add batch_bytes configuration for Flint #329

Merged
merged 5 commits into from
May 3, 2024

Conversation

penghuo
Collaborator

@penghuo penghuo commented May 2, 2024

Description

Change the default values

  • spark.datasource.flint.write.batch_size: 1000 --> Integer.MAX_VALUE.
  • spark.datasource.flint.write.refresh_policy: wait_for --> false.
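
A job that depends on the previous behavior could presumably set the old values explicitly. The following is a minimal sketch, assuming the settings are passed to spark-submit as `--conf key=value` flags; the helper name `to_conf_flags` is hypothetical and not part of Flint.

```python
# Previous defaults, as listed above. Setting them explicitly is assumed to
# restore the old write behavior.
PREVIOUS_DEFAULTS = {
    "spark.datasource.flint.write.batch_size": "1000",
    "spark.datasource.flint.write.refresh_policy": "wait_for",
}


def to_conf_flags(settings):
    """Render settings as spark-submit `--conf key=value` arguments.

    Hypothetical helper for illustration only.
    """
    flags = []
    for key, value in settings.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Print the flags so they can be pasted into a spark-submit command line.
    print(" ".join(to_conf_flags(PREVIOUS_DEFAULTS)))
```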

Add new settings

  • spark.datasource.flint.write.batch_bytes: The approximate amount of data, in bytes, written to Flint in a single batch request. The actual amount of data written to OpenSearch may exceed it. The default value is 1mb. After each document, the writer checks whether the total number of documents (docCount) has reached batch_size or the buffer size has surpassed batch_bytes. If either condition is met, the current batch is flushed and the document count resets to zero.
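
The flush rule described above can be sketched as follows. This is an illustration of the check only, not the actual FlintWriter code: the class `BatchBuffer`, its `sink` callback, and the use of JSON-encoded length as the buffer size are all assumptions made for the example.

```python
import json


class BatchBuffer:
    """Illustrative stand-in for the writer; not the actual Flint code."""

    def __init__(self, batch_size, batch_bytes, sink):
        self.batch_size = batch_size    # maximum number of documents per batch
        self.batch_bytes = batch_bytes  # approximate byte threshold per batch
        self.sink = sink                # callable that receives each flushed batch
        self.docs = []
        self.buffered_bytes = 0

    def write(self, doc):
        line = json.dumps(doc)
        self.docs.append(line)
        self.buffered_bytes += len(line.encode("utf-8"))
        # Checked after each document: flush once the document count has
        # reached batch_size or the buffer has surpassed batch_bytes.
        if len(self.docs) >= self.batch_size or self.buffered_bytes > self.batch_bytes:
            self.flush()

    def flush(self):
        # Send whatever is buffered, then reset the count and the byte total.
        if self.docs:
            self.sink(self.docs)
        self.docs = []
        self.buffered_bytes = 0
```

Because the check runs after a document has been appended, a flushed batch can be larger than batch_bytes, which is consistent with the note that the actual amount written may exceed the setting.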

Test Result

With 3 EMR-S executors (4 vCPU, 16 GB memory) writing to a single-node OpenSearch cluster, throughput is 40 MB/s.

Issues Resolved

#304

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

penghuo added 4 commits May 1, 2024 17:21
Signed-off-by: Peng Huo <[email protected]>
Signed-off-by: Peng Huo <[email protected]>
@penghuo penghuo marked this pull request as ready for review May 2, 2024 17:32
@penghuo penghuo self-assigned this May 2, 2024
@penghuo penghuo added the 0.4 label May 2, 2024
@penghuo penghuo changed the title add batch_bytes for FlintWriter add batch_bytes configuration for Flint May 2, 2024
Signed-off-by: Peng Huo <[email protected]>
Collaborator

@dai-chen dai-chen left a comment

Thanks for the changes!

@penghuo penghuo merged commit d9c0ba8 into opensearch-project:main May 3, 2024
4 checks passed
2 participants